-
Notifications
You must be signed in to change notification settings - Fork 2
feat(algorithms, heaps): top k and kth largest #152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdded two heap algorithm features: Top K Largest (three functions) and K Closest Points to Origin (heap and sorting implementations), plus README documentation and unit tests for both; updated Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@algorithms/heap/topklargest/__init__.py`:
- Around line 72-76: The in-place call nums.sort(reverse=True) mutates the
caller's list; change it to operate on a non-mutating sorted sequence (e.g., use
sorted(nums, reverse=True) or sort a shallow copy) and then return the k-th
largest from that result (replace the nums.sort and return nums[k - 1] usage
with a non-mutating sorted result variable and return that_result[k - 1]);
ensure references to nums.sort(reverse=True) and the subsequent return are
updated accordingly.
- Around line 5-27: k_largest lacks input validation and can IndexError on empty
inputs or invalid k; add guards at the start of k_largest to validate params
(nums and k) consistent with the other variants: if not nums or k <= 0 or k >
len(nums) raise a ValueError with a clear message (e.g., "k must be between 1
and len(nums) and nums must be non-empty"); perform this validation before
building min_heap and iterating.
🧹 Nitpick comments (2)
algorithms/heap/topklargest/__init__.py (1)
30-56: Implementation is correct; consider exception vs sentinel for invalid input.The heap-based algorithm is correctly implemented with O(n log k) time complexity. One consideration: returning
-1for invalid input could be ambiguous if-1is a valid element in the array. An alternative would be to raise aValueError, which is more Pythonic for invalid arguments. However, this is consistent withkth_largest_sorting, so the current approach is acceptable if documented.algorithms/heap/topklargest/test_top_k_largest_elements.py (1)
6-27: Consider adding edge case tests for invalid inputs.The current test cases cover happy paths well, but consider adding tests for:
- Empty list:
([], 1, -1)forkth_largest/kth_largest_sortingk=0:([1,2,3], 0, -1)k > len(nums):([1,2], 5, -1)This would validate the input validation logic and document expected behavior.
📝 Suggested additional test cases
KTH_LARGEST_EDGE_CASES = [ ([], 1, -1), # empty list ([1, 2, 3], 0, -1), # k = 0 ([1, 2], 5, -1), # k > len(nums) ]
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (16)
algorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_1.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_10.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_11.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_2.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_3.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_4.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_5.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_6.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_7.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_8.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/kth_largest_element_in_array_solution_9.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/top_k_largest_element_in_array_solution_1.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/top_k_largest_element_in_array_solution_2.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/top_k_largest_element_in_array_solution_3.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/top_k_largest_element_in_array_solution_4.pngis excluded by!**/*.pngalgorithms/heap/topklargest/images/solutions/top_k_largest_element_in_array_solution_5.pngis excluded by!**/*.png
📒 Files selected for processing (6)
DIRECTORY.mdalgorithms/heap/topkfrequentelements/README.mdalgorithms/heap/topkfrequentelements/__init__.pyalgorithms/heap/topklargest/README.mdalgorithms/heap/topklargest/__init__.pyalgorithms/heap/topklargest/test_top_k_largest_elements.py
🧰 Additional context used
🧬 Code graph analysis (1)
algorithms/heap/topklargest/test_top_k_largest_elements.py (1)
algorithms/heap/topklargest/__init__.py (3)
k_largest(5-27)kth_largest(30-56)kth_largest_sorting(59-76)
🪛 LanguageTool
algorithms/heap/topklargest/README.md
[style] ~84-~84: Using many exclamation marks might seem excessive (in this case: 11 exclamation marks for a text that’s 3674 characters long)
Context: ...argest_element_in_array_solution_1.png)

🪛 markdownlint-cli2 (0.18.1)
DIRECTORY.md
143-143: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
144-144: Unordered list indentation
Expected: 4; Actual: 6
(MD007, ul-indent)
🔇 Additional comments (4)
algorithms/heap/topklargest/README.md (2)
1-46: Documentation is well-structured and accurate.The explanation of the min-heap approach for Top-K elements is clear, with appropriate complexity analysis. The note about variable k complexity is helpful for understanding the general case.
49-106: LGTM!The documentation for the Kth Largest Element problem is comprehensive, covering both sorting and min-heap approaches with accurate complexity analysis.
algorithms/heap/topklargest/test_top_k_largest_elements.py (1)
12-16: LGTM!Good use of
sorted()comparison fork_largestsince the heap doesn't guarantee order within the result set.DIRECTORY.md (1)
143-144: LGTM!The new entry for
Topklargestfollows the existing indentation pattern in the file. The static analysis warning about indentation can be safely ignored as the style is consistent with other entries in the Heap section (e.g., Schedule Tasks on lines 141-142).
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@algorithms/heap/topkclosesttoorigin/__init__.py`:
- Around line 31-36: k_closest_to_origin_sorting currently mutates the input by
calling points.sort(); change it to avoid side effects by sorting a copy (use
sorted(points, key=...)) or by copying first (points_copy = list(points);
points_copy.sort(...)) and then return the first k from the sorted copy so the
original input remains unchanged, matching the non-mutating behavior of
k_closest_to_origin.
In `@algorithms/heap/topkclosesttoorigin/README.md`:
- Around line 6-8: The distance formula in the README is garbled; replace the
incorrect expression with correct mathematical notation: for distance from
origin use "sqrt(x² + y²)" (or "√(x² + y²)"), and for distance between two
points use "sqrt((x₁ - x₂)² + (y₁ - y₂)²)" (or "√((x₁ - x₂)² + (y₁ - y₂)²)"),
ensuring variable names match those used elsewhere in the doc (e.g., x,y and a,b
or x₁,y₁ and x₂,y₂) and remove the malformed `√(x1 - a2)2 + (y1 - b2)2` text.
In `@algorithms/heap/topkclosesttoorigin/test_top_k_closest_to_origin.py`:
- Around line 23-25: The test sorts results only by x-coordinate which can be
flaky when x values tie; update the sorting key in
test_top_k_closest_to_origin.py to use both coordinates (e.g., key that returns
a tuple of (x, y)) for sorted_expected and sorted_actual in the block using
sorted_expected/sorted_actual and apply the same change in
test_top_k_closest_to_origin_sorting so comparisons are deterministic.
🧹 Nitpick comments (1)
algorithms/heap/topkclosesttoorigin/__init__.py (1)
5-28: Consider adding edge case validation.The heap-based implementation is correct and efficient. However, there's no validation for edge cases:
k <= 0would return an empty list (acceptable but undocumented)k > len(points)would return all points (acceptable)- Empty
pointslist would return empty (acceptable)These behaviors may be intentional, but documenting them or adding explicit validation would improve robustness.
🔧 Optional: Add input validation
def k_closest_to_origin(points: List[List[int]], k: int) -> List[List[int]]: + if not points or k <= 0: + return [] + k = min(k, len(points)) + # Max heap will store the top k points closest to the origin in the form (-distance, idx). The distance is negated
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (13)
algorithms/heap/topkclosesttoorigin/images/examples/top_k_closest_to_origin_example_1.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_0.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_1.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_10.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_11.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_2.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_3.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_4.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_5.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_6.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_7.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_8.pngis excluded by!**/*.pngalgorithms/heap/topkclosesttoorigin/images/solutions/top_k_closest_to_origin_solution_9.pngis excluded by!**/*.png
📒 Files selected for processing (4)
DIRECTORY.mdalgorithms/heap/topkclosesttoorigin/README.mdalgorithms/heap/topkclosesttoorigin/__init__.pyalgorithms/heap/topkclosesttoorigin/test_top_k_closest_to_origin.py
🧰 Additional context used
🧬 Code graph analysis (1)
algorithms/heap/topkclosesttoorigin/test_top_k_closest_to_origin.py (1)
algorithms/heap/topkclosesttoorigin/__init__.py (2)
k_closest_to_origin(5-28)k_closest_to_origin_sorting(31-36)
🪛 LanguageTool
algorithms/heap/topkclosesttoorigin/README.md
[style] ~82-~82: Using many exclamation marks might seem excessive (in this case: 10 exclamation marks for a text that’s 3221 characters long)
Context: ...top_k_closest_to_origin_solution_0.png)

🪛 markdownlint-cli2 (0.18.1)
DIRECTORY.md
143-143: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
144-144: Unordered list indentation
Expected: 4; Actual: 6
(MD007, ul-indent)
145-145: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
146-146: Unordered list indentation
Expected: 4; Actual: 6
(MD007, ul-indent)
🔇 Additional comments (3)
DIRECTORY.md (1)
143-146: LGTM!The new directory entries for
TopkclosesttooriginandTopklargestare correctly added and follow the existing indentation pattern used throughout the file. The static analysis hints about indentation (MD007) are false positives since the file consistently uses 4-space and 6-space indentation for nested list items.algorithms/heap/topkclosesttoorigin/README.md (1)
56-104: LGTM!The algorithm explanation and complexity analysis are well-written and accurate. The cross-reference to the related
topklargestimplementation provides helpful context.algorithms/heap/topkclosesttoorigin/test_top_k_closest_to_origin.py (1)
9-13: Test coverage is adequate for core functionality.The test cases cover the basic scenarios. Consider adding edge cases in future iterations:
k = 0k > len(points)- Points with identical distances to origin
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
algorithms/heap/topkclosesttoorigin/test_top_k_closest_to_origin.py
Outdated
Show resolved
Hide resolved
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…in.py Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Describe your change:
Checklist:
Fixes: #{$ISSUE_NO}.Summary by CodeRabbit
New Features
Documentation
Tests
✏️ Tip: You can customize this high-level summary in your review settings.